Coherence-based Partial Exact Recovery 
Condition for OMP/OLS 

C. Herzet*, C. Soussen, J. Idier, and R. Gribonval 



Abstract

We address the exact recovery of the support of a $k$-sparse vector with Orthogonal Matching Pursuit
(OMP) and Orthogonal Least Squares (OLS) in a noiseless setting. We consider the scenario where
OMP/OLS have selected true atoms during the first $l$ iterations ($l < k$) and derive a sufficient and
worst-case necessary condition for their success in $k$ steps. Our result is based on the coherence $\mu$ of
the dictionary and relaxes Tropp's well-known condition $\mu < 1/(2k - 1)$ to the case where OMP/OLS
have a partial knowledge of the support.

Index Terms

Orthogonal Matching Pursuit; Orthogonal Least Squares; coherence; $k$-step analysis; exact support
recovery



I. Introduction 

Sparse representations aim at describing a signal as the combination of a few elementary signals (or
atoms) taken from an overcomplete dictionary $A$. In particular, in a noiseless setting, one wishes to find
the vector with the smallest number of non-zero elements satisfying a set of linear constraints, that is

$$\min_{x} \|x\|_0 \ \text{ subject to } \ Ax = y, \qquad (1)$$

where $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$. Problem (1) is NP-hard in general [1]: in the worst case, finding the
solution requires sweeping over all possible supports for $x$.

In order to circumvent this bottleneck, suboptimal (but tractable) algorithms have been proposed 
in the literature. Among the most popular approaches, one can mention the procedures based on a 
relaxation of the $\ell_0$ pseudo-norm (e.g., Basis Pursuit [2], FOCUSS [3]) and the so-called "greedy pursuit"
algorithms, e.g., Matching Pursuit (MP) [4], Orthogonal Matching Pursuit (OMP) [5], Orthogonal Least 
Squares (OLS) [1], [6]. However, the suboptimal nature of these algorithms raises the question of their 
performance. In particular, if y = Ax*, under which conditions can one ensure that a suboptimal 
algorithm recovers x* from y? The goal of this paper is to provide novel elements of answer to this 
question for OMP and OLS. 

OMP has been widely studied in recent years, including worst-case [7], [8] and probabilistic
analyses [9]. The existing exact recovery analyses of OMP were also adapted to several extensions of
OMP, namely regularized OMP [8], weak OMP [10], and Stagewise OMP [11]. Although OLS has been
known in the literature for a few decades (often under different names [12]), exact recovery analyses of
OLS remain rare for two reasons. First, OLS is significantly more time consuming than OMP, which
discourages the choice of OLS for "real-time" applications such as compressive sensing. Second, the
selection rule of OLS is more complex, as the projected atoms are normalized, which makes the OLS
analysis more intricate. When the dictionary atoms are close to orthogonal, OLS and OMP have a similar
behavior, as emphasized in [10]. On the contrary, for correlated dictionaries (e.g., in inverse problems),
their behaviors differ significantly and OLS may be a better choice [13]. The above arguments motivate our
analysis of both OMP and OLS, although in the present paper our low mutual coherence assumptions
imply that the correlation between atoms is weak; therefore, we do not exhibit any difference of behavior
between OMP and OLS.

In [7], Tropp provided the first general analysis of OMP. More specifically, he derived a sufficient and
worst-case necessary condition under which OMP is ensured to recover a $k$-sparse vector with a given
support in $k$ iterations. Recently, Soussen et al. [13] showed that Tropp's exact recovery condition (ERC)
is also sufficient and worst-case necessary for OLS.

A possible drawback of Tropp's ERC lies in its cumbersome evaluation, since it requires solving a
number of linear systems. Hence, Tropp proposed in [7] a stronger sufficient condition, easier to evaluate,
guaranteeing the recovery of any $k$-sparse vector (for any support) by OMP. His condition reads:

$$\mu < \frac{1}{2k - 1}, \qquad (2)$$

where $\mu$ is the dictionary coherence, which only involves inner products between the dictionary atoms
(see Definition 2 below). Note that (2) is also a sufficient condition for OLS since (2) implies Tropp's
ERC which, in turn, is a sufficient condition for OLS. On the other hand, Cai & Wang recently emphasized
that (2) is a worst-case necessary condition in some sense [14].

At this point, let us stress that the conditions mentioned above are worst-case necessary, that is,
OMP/OLS will fail for some $y$'s (and some particular dictionaries for (2)) as soon as they are not
satisfied. However, when these conditions are not verified, one can observe in practice that OMP/OLS
often succeed in recovering $x^*$ for many other observation vectors. In this paper, we investigate the case
where (2) is not necessarily satisfied, but OMP/OLS nevertheless select $l$ atoms belonging to the support
of $x^*$ during the first $l$ iterations. Our work is in the continuity of [13], in which the authors extended
Tropp's condition to the $l$-th iteration of OMP and OLS. The resulting conditions are however rather
complex and impractical for numerical evaluation. In this paper, we derive a simpler (although stronger)
condition based on the coherence of the dictionary. We show that

$$\mu < \frac{1}{2k - l - 1} \qquad (3)$$

is sufficient and worst-case necessary (in some sense) for the success of OMP/OLS in $k$ steps when $l$
atoms of the support have been selected during the first $l$ iterations.

II. Notations 

The following notations will be used in this paper. $\langle \cdot, \cdot \rangle$ refers to the inner product between vectors;
$\|\cdot\|$ and $\|\cdot\|_1$ stand for the Euclidean and the $\ell_1$ norms, respectively. $X^\dagger$ denotes the pseudo-inverse of a
matrix $X$. For a full-rank and undercomplete matrix, we have $X^\dagger = (X^T X)^{-1} X^T$, where $\cdot^T$ stands for
matrix transposition. When $X$ is overcomplete, spark($X$) denotes the minimum number of columns from
$X$ that are linearly dependent [15]. $\mathbf{1}_p$ (resp. $\mathbf{0}_p$) denotes the all-one (resp. all-zero) vector of dimension
$p$. The letter $Q$ denotes some subset of the column indices, and $X_Q$ is the submatrix of $X$ gathering
the columns indexed by $Q$. For vectors, $x_Q$ denotes the subvector of $x$ indexed by $Q$. We denote
the cardinality of $Q$ by $|Q|$, and use the same notation for the absolute value of a scalar quantity.
Finally, $P_Q = X_Q X_Q^\dagger$ and $P_Q^\perp = I - P_Q$ denote the orthogonal projection operators onto span($X_Q$) and
span($X_Q$)$^\perp$, where span($X$) stands for the column span of $X$, span($X$)$^\perp$ is the orthogonal complement
of span($X$), and $I$ is the identity matrix whose dimension is equal to the number of rows in $X$.

III. OMP and OLS

In this section, we recall the selection rules defining OMP and OLS. Throughout the paper, we will 
assume that the dictionary columns are normalized. 

First note that any vector $x$ satisfying the constraint in (1) must have a support, say $Q$, such that
$r_Q = P_Q^\perp y = \mathbf{0}_m$, since $y$ must belong to span($A_Q$). Hence, problem (1) can equivalently be rephrased
as

$$\min_{Q} |Q| \ \text{ subject to } \ r_Q = \mathbf{0}_m. \qquad (4)$$

OMP and OLS can be understood as iterative procedures searching for a solution of (4) by sequentially
updating a support estimate as

$$Q \leftarrow Q \cup \{j\}, \qquad (5)$$

where

$$j \in \begin{cases} \arg\max_i |\langle a_i, r_Q \rangle| & \text{for OMP} \\ \arg\min_i \|r_{Q \cup \{i\}}\| & \text{for OLS} \end{cases} \qquad (6)$$

and $a_i$ is the $i$-th column of $A$. More specifically, OMP/OLS add one new atom to the support at each
iteration: OLS selects the atom minimizing the norm of the new residual $r_{Q \cup \{i\}}$, whereas OMP picks the
atom maximizing the correlation with the current residual.

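In the sequel, we will use a slightly different, equivalent formulation of (6). Let us define

$$\tilde{a}_i = P_Q^\perp a_i, \qquad (7)$$

$$\tilde{b}_i = \begin{cases} \tilde{a}_i / \|\tilde{a}_i\| & \text{if } \tilde{a}_i \neq \mathbf{0}_m \\ \mathbf{0}_m & \text{otherwise.} \end{cases} \qquad (8)$$

Hence, $\tilde{a}_i$ denotes the projection of $a_i$ onto span($A_Q$)$^\perp$, whereas $\tilde{b}_i$ is a normalized version of $\tilde{a}_i$. For
simplicity, we dropped the dependence of $\tilde{a}_i$ and $\tilde{b}_i$ on $Q$ in our notations. However, when there is a
risk of confusion, we will use $\tilde{a}_i^Q$ (resp. $\tilde{b}_i^Q$) instead of $\tilde{a}_i$ (resp. $\tilde{b}_i$). With these notations, (6) can be
re-expressed as

$$j \in \begin{cases} \arg\max_i |\langle \tilde{a}_i, r_Q \rangle| & \text{for OMP} \\ \arg\max_i |\langle \tilde{b}_i, r_Q \rangle| & \text{for OLS.} \end{cases} \qquad (9)$$

The equivalence between (6) and (9) is straightforward for OMP by noticing that $r_Q \in$ span($A_Q$)$^\perp$, so that
$\langle a_i, r_Q \rangle = \langle \tilde{a}_i, r_Q \rangle$. We refer the reader to [16] for a detailed calculation for OLS.

Throughout the paper, we will use the common acronym Oxx in statements that apply to both OMP
and OLS. Moreover, we define the unifying notation:

$$\tilde{c}_i = \begin{cases} \tilde{a}_i & \text{for OMP,} \\ \tilde{b}_i & \text{for OLS.} \end{cases} \qquad (10)$$

Finally, we will use the notations $\tilde{A}$, $\tilde{B}$ and $\tilde{C}$ to refer to the matrices whose columns are made up of
the $\tilde{a}_i$'s, $\tilde{b}_i$'s and $\tilde{c}_i$'s, respectively.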
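
As an illustration of the selection rule (9), the following NumPy sketch runs a fixed number of OMP or OLS iterations by projecting the atoms onto the orthogonal complement of the span of the current support. The function name `greedy_pursuit` and its arguments are hypothetical (they do not come from the paper), unit-norm columns are assumed, and the projector is recomputed explicitly at each iteration for readability rather than efficiency.

```python
import numpy as np

def greedy_pursuit(A, y, k, rule="OMP", tol=1e-12):
    """Run k iterations of OMP or OLS with the projected-atom selection rule.

    A    : (m, n) dictionary, columns assumed to have unit norm.
    y    : (m,) observation vector.
    rule : "OMP" scores the projected atoms, "OLS" the normalized projected atoms.
    Returns the selected column indices, in selection order.
    """
    m, _ = A.shape
    support = []
    for _ in range(k):
        if support:
            A_Q = A[:, support]
            # Orthogonal projector onto the complement of span(A_Q).
            P_perp = np.eye(m) - A_Q @ np.linalg.pinv(A_Q)
        else:
            P_perp = np.eye(m)
        residual = P_perp @ y            # current residual r_Q
        A_tilde = P_perp @ A             # projected atoms
        scores = np.abs(A_tilde.T @ residual)
        if rule == "OLS":
            norms = np.linalg.norm(A_tilde, axis=0)
            nonzero = norms > tol
            # Normalize the projected atoms; zero atoms get a zero score.
            scores = np.where(nonzero, scores / np.where(nonzero, norms, 1.0), 0.0)
        if support:
            scores[support] = 0.0        # selected atoms project to zero
        support.append(int(np.argmax(scores)))
    return support
```

With an orthonormal dictionary both rules coincide, e.g. `greedy_pursuit(np.eye(4), np.array([3.0, 0.0, -2.0, 0.0]), 2)` picks the two active columns in decreasing order of magnitude.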

IV. Context and Main Result 
Let us assume that $y$ is a linear combination of $k$ columns of $A$, that is

$$y = A_{Q^*} x_{Q^*}, \ \text{ with } |Q^*| = k, \ x_i \neq 0 \ \ \forall i \in Q^*. \qquad (11)$$

The atoms $a_i$ ($i \in Q^*$) will be referred to as the "true" atoms. We review hereafter different conditions
ensuring the success of Oxx and present our main result. The definition of "success" that will be used
throughout the paper is as follows.

Definition 1 (Successful recovery) Oxx with $y$ as input succeeds if and only if it selects atoms in $Q^*$
during the first $k$ iterations.

The notion of successful recovery may be defined in a weaker sense: Plumbley [17, Corollary 4]
first pointed out that there exist problems for which "delayed recovery" occurs after more than $k$ steps.
Specifically, Oxx can select some wrong atoms during the first $k$ iterations but ends up with a larger
support including $Q^*$ with a number of iterations slightly greater than $k$. In the noise-free setting (for
$y \in$ span($A_{Q^*}$)), all atoms not belonging to $Q^*$ are then weighted by 0 in the solution vector. Recently,
a delayed recovery analysis of OMP using restricted-isometry constants was proposed in [18] and then
extended to the weak OMP algorithm (including OLS) in [10]. In the present paper, exactly $k$ steps are
performed; thus, delayed recovery is considered as a recovery failure.

Moreover, we make clear that in special cases where the Oxx selection rule yields multiple solutions
including a wrong atom, that is

$$\max_{i \in Q^* \setminus Q} |\langle \tilde{c}_i, r_Q \rangle| = \max_{j \notin Q^*} |\langle \tilde{c}_j, r_Q \rangle|, \qquad (12)$$

we consider that Oxx systematically takes a wrong decision. Hence, situation (12) always leads to a
recovery failure.

The first thorough theoretical analysis of OMP is due to Tropp, see [7, Theorems 3.1 and 3.10]. Tropp
provided a sufficient and worst-case necessary condition for the exact recovery of any sparse vector with
a given support $Q^*$. The derivation of a similar condition for OLS is more recent and is due to Soussen
et al. [13]. In the latter paper, the authors carried out a detailed analysis of both OMP and OLS at any
iteration of the algorithm using specific recovery conditions depending not only on $Q^*$ but also on the
current support $Q$, whereas Tropp's ERC only involves $Q^*$ and does not depend on the iteration. The
main result in [13] reads:

Theorem 1 (Soussen et al.'s Partial ERC [13, Theorem 3]) Assume that $A_{Q^*}$ is full rank and let
$Q \subset Q^*$ with $|Q^*| = k$, $|Q| = l$. If Oxx with $y \in$ span($A_{Q^*}$) as input selects atoms in $Q$ during the
first $l$ iterations, and

$$\max_{j \notin Q^*} \|\tilde{C}_{Q^* \setminus Q}^\dagger \tilde{c}_j\|_1 < 1, \qquad (13)$$

then Oxx only selects atoms in $Q^* \setminus Q$ during the $k - l$ subsequent iterations. Conversely, if (13) does
not hold, there exists $y \in$ span($A_{Q^*}$) for which OLS selects $Q$ during the first $l$ iterations and then a
wrong atom $j \notin Q^*$ at the $(l + 1)$th iteration.

We note that (13), on its own, does not constitute a worst-case necessary condition for OMP if $Q \neq \emptyset$.
More specifically, as shown in [13], some additional "reachability" hypotheses are required for (13) to
be a worst-case necessary condition for OMP.

Interestingly, when $Q = \emptyset$, one recovers Tropp's ERC [7]:

$$\max_{j \notin Q^*} \|A_{Q^*}^\dagger a_j\|_1 < 1, \qquad (14)$$

which constitutes a sufficient and worst-case necessary condition for both OMP and OLS at the very first
iteration.

One drawback of Tropp's and Soussen et al.'s ERCs lies in their impractical evaluation. Indeed,
evaluating (13)-(14) requires computing a pseudo-inverse (and a projection for (13)). Moreover, the
support $Q^*$ is unknown in practice. Hence, ensuring that Oxx will recover any $k$-sparse vector requires
testing whether (14) is met for all possible supports $Q^*$ of cardinality $k$ (resp. evaluating (13) for all $Q^*$
and for all $Q \subset Q^*$ of cardinality $l$).

In order to circumvent this problem, stronger conditions, but easier to evaluate, have been proposed 
in the literature. We can mainly distinguish between two types of "practical" guarantees: the conditions 
based on restricted-isometry constants (RIC) and those based on the coherence of the dictionary (see 
Definition 2 below). 

The contributions [8], [19]-[22] provide RIC-based sufficient conditions for an exact recovery of the
support in $k$ steps by OMP. The most recent and tightest results are due to Maleh [21] and Mo & Shen
[22]. The authors proved that OMP succeeds in $k$ steps if $\delta_{k+1} < \frac{1}{\sqrt{k} + 1}$, where $\delta_{k+1}$ is the $(k + 1)$-RIC
of $A$. In [22, Theorem 3.2], the authors showed moreover that this condition is almost tight, i.e., there
exists a dictionary $A$ with $\delta_{k+1} = \frac{1}{\sqrt{k}}$ and a $k$-term representation $y$ for which OMP selects a wrong
atom at the first iteration. Let us mention that, by virtue of Theorem 1, these results remain valid for
OLS.

On the other hand, Tropp derived in [7, Corollary 3.6] a sufficient condition for OMP, stronger than 
(14) but only based on the coherence of the dictionary A. 

Definition 2 The mutual coherence $\mu$ of a dictionary $A$ is defined as

$$\mu = \max_{i \neq j} |\langle a_i, a_j \rangle|. \qquad (15)$$
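
The coherence is cheap to evaluate since it only involves the Gram matrix of the dictionary. A minimal NumPy sketch follows; the helper names `mutual_coherence` and `satisfies_tropp_condition` are hypothetical, and the columns are normalized inside the function as an assumption (no zero column).

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between distinct normalized columns of A."""
    A = np.asarray(A, dtype=float)
    A = A / np.linalg.norm(A, axis=0)   # assumes no zero column
    G = np.abs(A.T @ A)                 # absolute Gram matrix
    np.fill_diagonal(G, 0.0)            # ignore the unit diagonal
    return float(G.max())

def satisfies_tropp_condition(A, k):
    """Check the coherence condition mu < 1/(2k - 1) for a sparsity level k."""
    return mutual_coherence(A) < 1.0 / (2 * k - 1)
```

For the three-atom dictionary with columns $(1,0)$, $(0,1)$ and $(1,1)/\sqrt{2}$, the coherence is $1/\sqrt{2}$, so the condition holds for $k = 1$ but not for $k = 2$.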

Tropp's condition reads as in (2) and ensures that (14) is satisfied. Since (14) guarantees the success of
OLS (Theorem 1 for iteration $l = 0$), (2) is also a sufficient condition for OLS. Moreover, Cai & Wang
recently showed in [14, Theorem 3.1] that (2) is also worst-case necessary in the following sense: there
exists (at least) one $k$-sparse vector $x^*$ and one dictionary $A$ with $\mu = \frac{1}{2k - 1}$ such that Oxx$^1$ cannot
recover $x^*$ from $y = Ax^*$. These results are summarized in the following theorem:

Theorem 2 ($\mu$-based ERC for Oxx [7, Corollary 3.6], [14, Theorem 3.1]) If (2) is satisfied, then Oxx
succeeds in recovering any $k$-term representation. Conversely, there exists an instance of dictionary $A$
and a $k$-term representation for which: (i) $\mu = \frac{1}{2k - 1}$; (ii) Oxx selects a wrong atom at the first iteration.

In this paper, we extend the work by Soussen et al. and provide a coherence-based sufficient and
worst-case necessary condition for the success of Oxx in $k$ iterations provided that true atoms have been
selected in the first $l$ iterations. Our main result generalizes Theorem 2 to the case where $l$ true atoms
have been selected:

Theorem 3 ($\mu$-based Partial ERC for Oxx) Consider a $k$-term representation $y \in$ span($A_{Q^*}$). Assume
that, at iteration $l < k$, Oxx has selected $l$ true atoms in $Q^*$. If

$$\mu < \frac{1}{2k - l - 1}, \qquad (16)$$

then Oxx exactly recovers $Q^*$ in $k$ iterations.

Conversely, there exists a dictionary $A$ and a $k$-term representation $y$ such that: (i) $\mu = \frac{1}{2k - l - 1}$; (ii)
Oxx selects true atoms during the first $l$ iterations and then a wrong atom at the $(l + 1)$th iteration.
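
To see how the coherence bound $1/(2k - l - 1)$ of Theorem 3 relaxes as more true atoms are selected, the small sketch below tabulates it exactly with Python's `fractions` module. The function name `partial_coherence_bound` and the value $k = 5$ are illustrative choices, not taken from the paper.

```python
from fractions import Fraction

def partial_coherence_bound(k, l):
    """Upper bound 1/(2k - l - 1) on the coherence, for 0 <= l < k."""
    if not 0 <= l < k:
        raise ValueError("expected 0 <= l < k")
    return Fraction(1, 2 * k - l - 1)

k = 5
# One bound per number l of true atoms already selected.
bounds = [partial_coherence_bound(k, l) for l in range(k)]
```

Here `bounds` goes from 1/9 for $l = 0$ (Tropp's condition) up to 1/5 for $l = k - 1$, so each correctly selected atom loosens the requirement on $\mu$.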

$^1$ And actually, any sparse representation algorithm.

The proof of this theorem is deferred to sections V, VI and VII. More specifically, we show in section
V (resp. section VI) that (16) is sufficient for the success of OMP (resp. OLS) during the last $k - l$
iterations. The proof of this sufficient condition differs significantly for OMP and OLS. The result is
shown for OMP by deriving an upper bound on Soussen et al.'s extended ERC as a function of the
restricted isometry bounds of the projected dictionary. As for OLS, the proof is based on a connection
between Soussen et al.'s ERC and the mutual coherence of the normalized projected dictionary $\tilde{B}$.
Finally, in section VII we prove that (16) is worst-case necessary for Oxx in the sense specified in
Theorem 3. The proof is common to both OMP and OLS.

V. Sufficient condition for OMP at iteration $l$

In this section, we prove the sufficient condition result of Theorem 3 for OMP. The result is a direct
consequence of Theorem 4 stated below, which provides an upper bound on the left-hand side of (13)
depending only on the coherence of the dictionary $A$:

Theorem 4 Let $Q \subset Q^*$, with $|Q| = l$, $|Q^*| = k$. If

$$\mu < \frac{1}{k - 1}, \qquad (17)$$

then

$$\max_{j \notin Q^*} \|\tilde{A}_{Q^* \setminus Q}^\dagger \tilde{a}_j\|_1 \le \frac{(k - l)\mu}{1 - (k - 1)\mu}. \qquad (18)$$

The sufficient condition for OMP stated in Theorem 3 then derives from Theorem 4. We see that

$$\frac{(k - l)\mu}{1 - (k - 1)\mu} < 1 \qquad (19)$$

implies (13) and is therefore sufficient for the success of OMP in $k$ iterations. Now, (19) is equivalent
to (16), which proves the result.
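
For completeness, the equivalence between the bound $\frac{(k-l)\mu}{1-(k-1)\mu} < 1$ and condition (16) can be spelled out as follows. This is a short worked derivation, assuming $\mu < 1/(k-1)$ as in (17) so that the denominator is positive.

```latex
\begin{align*}
\frac{(k-l)\mu}{1-(k-1)\mu} < 1
  &\iff (k-l)\mu < 1-(k-1)\mu && \text{since } 1-(k-1)\mu > 0 \\
  &\iff (2k-l-1)\mu < 1 \\
  &\iff \mu < \frac{1}{2k-l-1}.
\end{align*}
```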

Before proving Theorem 4, we need to define some quantities characterizing the projected dictionary
$\tilde{A}$ appearing in the implementation of OMP (see (9)) and state some useful propositions. In the following
definition, we generalize the concept of restricted isometry property (RIP) [23] to projected dictionaries,
under the name projected RIP (P-RIP):

Definition 3 Dictionary $A$ satisfies the P-RIP($\underline{\delta}_{q,l}, \bar{\delta}_{q,l}$) if and only if, $\forall Q', Q$ with $|Q'| = q$, $|Q| = l$,
$Q \cap Q' = \emptyset$, and $\forall x_{Q'}$, we have

$$(1 - \underline{\delta}_{q,l}) \|x_{Q'}\|^2 \le \|\tilde{A}_{Q'} x_{Q'}\|^2 \le (1 + \bar{\delta}_{q,l}) \|x_{Q'}\|^2. \qquad (20)$$

The definition of the standard (asymmetric) restricted isometry constants corresponds to the tightest
possible bounds when $l = 0$ (see e.g., [24], [25]). For $l \ge 1$, $\underline{\delta}_{q,l}$ and $\bar{\delta}_{q,l}$ can be seen as (asymmetric)
bounds on the restricted isometry constants of projected dictionaries. Note that $\bar{\delta}_{q,l}$ is not necessarily
non-negative since the columns of $\tilde{A}$ are not normalized ($\|\tilde{a}_i\| \le 1$). Note also that many well-known
properties of the standard restricted isometry constants (see [26, Proposition 3.1] for example) remain
valid for $\underline{\delta}_{q,l}$ and $\bar{\delta}_{q,l}$.

The next proposition provides an upper bound on the left-hand side of (13) depending only on $\underline{\delta}_{q,l}$
and $\bar{\delta}_{q,l}$:

Proposition 1 Let $Q \subset Q^*$, with $|Q| = l$, $|Q^*| = k$. If $\underline{\delta}_{k-l,l} < 1$, then

$$\max_{j \notin Q^*} \|\tilde{A}_{Q^* \setminus Q}^\dagger \tilde{a}_j\|_1 \le \frac{(k - l)(\underline{\delta}_{2,l} + \bar{\delta}_{2,l})}{2(1 - \underline{\delta}_{k-l,l})}. \qquad (21)$$

The proof of Proposition 1 is deferred to Appendix A. The next proposition provides some possible
values for $\underline{\delta}_{q,l}$ and $\bar{\delta}_{q,l}$ as a function of the coherence of the dictionary $A$:

Proposition 2 If $\mu < 1/(l - 1)$, then $A$ satisfies the P-RIP($\underline{\delta}_{q,l}, \bar{\delta}_{q,l}$) with

$$\bar{\delta}_{q,l} = (q - 1)\mu, \qquad (22)$$

$$\underline{\delta}_{q,l} = (q - 1)\mu + \frac{q\, l\, \mu^2}{1 - (l - 1)\mu}. \qquad (23)$$

The proof of this result is deferred to Appendix A. We are now ready to prove Theorem 4:



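Proof: (Theorem 4) We rewrite the right-hand side of (21) as a function of $\mu$. From Proposition 2,
we have that $A$ satisfies the P-RIP($\underline{\delta}_{q,l}, \bar{\delta}_{q,l}$) with constants defined in (22)-(23) as long as

$$\mu < \frac{1}{l - 1}. \qquad (24)$$

Now, we have $\mu < 1/(k - 1)$ by hypothesis, which implies $\mu < 1/(l - 1)$. Using (22) and (23), we
calculate that:

$$\frac{\underline{\delta}_{2,l} + \bar{\delta}_{2,l}}{2} = \mu + \frac{l \mu^2}{1 - (l - 1)\mu} = \frac{\mu(\mu + 1)}{1 - (l - 1)\mu}, \qquad (25)$$

$$1 - \underline{\delta}_{k-l,l} = 1 - (k - l - 1)\mu - \frac{(k - l)\, l\, \mu^2}{1 - (l - 1)\mu} \qquad (26)$$

$$= \frac{1 - (k - 2)\mu - (k - 1)\mu^2}{1 - (l - 1)\mu} \qquad (27)$$

$$= \frac{(\mu + 1)(1 - (k - 1)\mu)}{1 - (l - 1)\mu}. \qquad (28)$$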

December 3, 2012 



DRAFT 



Therefore, the ratio in the right-hand side of (21) can be rewritten as

$$\frac{\underline{\delta}_{2,l} + \bar{\delta}_{2,l}}{2(1 - \underline{\delta}_{k-l,l})} = \frac{\mu}{1 - (k - 1)\mu}. \qquad (29)$$

According to (28), $\mu < 1/(k - 1) < 1/(l - 1)$ implies that $1 - \underline{\delta}_{k-l,l} > 0$. Proposition 1 combined
with (29) implies that (18) is met. ■
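
The algebra leading to (29) can be double-checked with exact rational arithmetic. The sketch below assumes that the constants of Proposition 2 take the form $\bar{\delta}_{q,l} = (q-1)\mu$ and $\underline{\delta}_{q,l} = (q-1)\mu + q\,l\,\mu^2/(1-(l-1)\mu)$ (our reading of (22)-(23)); the function names are hypothetical.

```python
from fractions import Fraction

def delta_upper(q, l, mu):
    # Assumed form of (22): upper P-RIP constant (independent of l).
    return (q - 1) * mu

def delta_lower(q, l, mu):
    # Assumed form of (23): lower P-RIP constant, for mu < 1/(l - 1).
    return (q - 1) * mu + q * l * mu ** 2 / (1 - (l - 1) * mu)

def ratio_in_21(k, l, mu):
    """Ratio (lower_2 + upper_2) / (2 (1 - lower_{k-l})) from the bound (21)."""
    numerator = delta_lower(2, l, mu) + delta_upper(2, l, mu)
    return numerator / (2 * (1 - delta_lower(k - l, l, mu)))

def closed_form_29(k, mu):
    """Closed form mu / (1 - (k - 1) mu)."""
    return mu / (1 - (k - 1) * mu)
```

For instance, with $k = 3$, $l = 1$ and $\mu = 1/6$, both expressions evaluate to $1/4$.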

Before concluding this section, let us remark that unlike Theorem 1, Theorem 3 does not (explicitly)
require all ($m \times k$)-submatrices $A_{Q^*}$ to be full rank. However, this condition is implicitly enforced by
(16). Indeed, as shown in [7, Lemma 2.3],

$$\mu < \frac{1}{k - 1} \qquad (30)$$

implies that $A_{Q^*}$ is full rank when $|Q^*| = k$. Hence, since $k - 1 < 2k - l - 1$, (16) also implies that
any submatrix $A_{Q^*}$ with $|Q^*| = k$ is full rank. Finally, we remark that $A_{Q^*}$ being full rank implies
that the projected submatrices $\tilde{A}_{Q^* \setminus Q}$ involved in Theorem 4 are also full rank [13, Corollary 3].

VI. Sufficient condition for OLS at iteration $l$

We now prove the sufficient condition for OLS stated in Theorem 3. The result is a consequence of
Proposition 3 and Lemma 1 stated below. We first need to introduce the coherence of the normalized
projected dictionary $\tilde{B}$:

Definition 4 (Coherence of the normalized projected dictionary)

$$\mu_l^{OLS} = \max_{|Q| = l} \max_{i \neq j} |\langle \tilde{b}_i^Q, \tilde{b}_j^Q \rangle|. \qquad (31)$$

The following proposition gives a sufficient condition on $\mu_l^{OLS}$ under which (13) is satisfied:

Proposition 3 Let $Q \subset Q^*$, with $|Q| = l$, $|Q^*| = k$. Assume that $A_{Q^*}$ is full rank. If $\mu_l^{OLS} < 1/(2k - 2l - 1)$,
then

$$\max_{j \notin Q^*} \|\tilde{B}_{Q^* \setminus Q}^\dagger \tilde{b}_j\|_1 < 1. \qquad (32)$$

Proof: When $\tilde{b}_j = \mathbf{0}_m$, the result is obvious. When $\tilde{b}_j \neq \mathbf{0}_m$, apply [7, Corollary 3.6] (that is: if $A$ has
normalized columns and $\mu < 1/(2k - 1)$ then Tropp's ERC is satisfied, i.e., $\forall Q^*$ such that $|Q^*| = k$,
$\max_{j \notin Q^*} \|A_{Q^*}^\dagger a_j\|_1 < 1$) to the matrix $\tilde{B}$ and to $Q^* \setminus Q$ of size $k - l$. The atoms of $\tilde{B}_{Q^* \setminus Q}$ are of unit
norm (actually, $\tilde{B}_{Q^* \setminus Q}$ is full rank) because $A_{Q^*}$ is full rank [13, Corollary 3]. ■

The next lemma provides a useful upper bound on $\mu_l^{OLS}$ as a function of the coherence $\mu$ of the
dictionary $A$:

Lemma 1 If $\mu < 1/l$, then

$$\mu_l^{OLS} \le \frac{\mu}{1 - l\mu}. \qquad (33)$$

The proof of this result is deferred to Appendix B. The sufficient condition stated in Theorem 3 for OLS
then follows from the combination of Proposition 3 and Lemma 1. Indeed, (16) implies $\mu < 1/(k - 1) \le 1/l$
since $2k - l - 1 = k - 1 + (k - l) > k - 1 \ge l$. Hence, the result follows by first applying Lemma 1:

$$\mu_l^{OLS} \le \frac{\mu}{1 - l\mu} < \frac{1}{2k - 2l - 1}, \qquad (34)$$

and then Proposition 3, which implies that (32) is met. Finally, $\mu < 1/(k - 1)$ implies that the full rank
assumption of Proposition 3 is met for any $Q^*$ of cardinality $k$ [7, Lemma 2.3].
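
The second inequality used for OLS, namely $\mu/(1 - l\mu) < 1/(2k - 2l - 1)$, is in fact equivalent to the coherence condition $\mu < 1/(2k - l - 1)$ whenever $1 - l\mu > 0$. The following sketch checks this equivalence on a grid of rational values; the function names and the grid are illustrative assumptions.

```python
from fractions import Fraction

def ols_chain_holds(k, l, mu):
    """True when mu/(1 - l*mu) < 1/(2k - 2l - 1); requires 1 - l*mu > 0."""
    return mu / (1 - l * mu) < Fraction(1, 2 * k - 2 * l - 1)

def coherence_condition(k, l, mu):
    """True when mu < 1/(2k - l - 1)."""
    return mu < Fraction(1, 2 * k - l - 1)

def equivalence_on_grid(k, l, steps=200):
    """Compare both tests for mu = p/steps with 1 - l*mu > 0."""
    for p in range(1, steps):
        mu = Fraction(p, steps)
        if l * mu >= 1:
            continue  # outside the range covered by Lemma 1
        if ols_chain_holds(k, l, mu) != coherence_condition(k, l, mu):
            return False
    return True
```

The equivalence follows from multiplying both sides by the positive quantities $1 - l\mu$ and $2k - 2l - 1$, which gives $(2k - l - 1)\mu < 1$.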

VII. Worst-case necessary condition for Oxx at iteration $l$

Cai & Wang recently showed in [14, Theorem 3.1] that there exist dictionaries $A$ with $\mu = \frac{1}{2k - 1}$ and
linear combinations $y$ of $k$ columns of $A$ such that $y$ has two distinct $k$-sparse representations in $A$. In
other words, if $\mu < \frac{1}{2k - 1}$ is not satisfied, there exist instances of dictionaries such that no algorithm can
unambiguously recover some $k$-sparse representations. In the context of Oxx, their result can be rephrased as
the following worst-case necessary condition: there exists a dictionary $A$ with $\mu = \frac{1}{2k - 1}$ and a support
$Q^*$, with $|Q^*| = k$, such that Oxx selects a wrong atom at the first iteration.

In this section, we derive a worst-case necessary condition in the case where Oxx has selected atoms
in $Q^*$ during the first $l$ iterations. We extend Cai & Wang's analysis and exhibit a scenario in which $l$
true atoms are selected, then the Oxx residual after $l$ iterations has two $(k - l)$-term representations. Our
result reads

Theorem 5 ((16) is a worst-case necessary condition for Oxx) There exists a dictionary $A$ with
$\mu = \frac{1}{2k - l - 1}$, a support $Q^*$ with $|Q^*| = k$ and $y \in$ span($A_{Q^*}$), such that Oxx with $y$ as input selects $l$ atoms
in $Q^*$ during the first $l$ iterations and a wrong atom at the $(l + 1)$th iteration.

To reach the result, we adopt a dictionary construction similar to Cai & Wang's in [14]. Let
$M \in \mathbb{R}^{(2k-l) \times (2k-l)}$ be the matrix with ones on the diagonal and $-\frac{1}{2k - l - 1}$ elsewhere. $M$ will play the role
of the Gram matrix $M = A^T A$. We will exploit the eigenvalue decomposition of $M$ to construct the
dictionary $A \in \mathbb{R}^{(2k-l-1) \times (2k-l)}$ with the desired properties. Since $M$ is symmetric, it can be expressed
as

$$M = U \Lambda U^T, \qquad (35)$$

where $U$ (resp. $\Lambda$) is the unitary matrix whose columns are the eigenvectors (resp. the diagonal matrix of
eigenvalues) of $M$. It is easy to check that $M$ has only two distinct eigenvalues: $\frac{2k - l}{2k - l - 1}$ with multiplicity
$2k - l - 1$ and $0$ with multiplicity one; moreover, the eigenvector associated with the null eigenvalue is
equal to $\mathbf{1}_{2k-l}$. The eigenvalues are sorted in decreasing order so that $0$ appears in the lower right
corner of $\Lambda$.

We define $A \in \mathbb{R}^{(2k-l-1) \times (2k-l)}$ as

$$A = T U^T, \qquad (36)$$

where $T \in \mathbb{R}^{(2k-l-1) \times (2k-l)}$ is such that

$$T(i, j) = \begin{cases} \sqrt{\frac{2k - l}{2k - l - 1}} & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} \qquad (37)$$

Note that $T^T T = \Lambda$. Hence, $A$ satisfies the hypotheses of Theorem 5 since

$$A^T A = U T^T T U^T = U \Lambda U^T = M, \qquad (38)$$

and therefore

$$\langle a_i, a_j \rangle = -\frac{1}{2k - l - 1} \quad \forall i \neq j. \qquad (39)$$

Since $M = A^T A$, we have $Mx = \mathbf{0}_{2k-l}$ if and only if $Ax = \mathbf{0}_{2k-l-1}$. Moreover, since $M$ has one
single zero eigenvalue with eigenvector $\mathbf{1}_{2k-l}$, the null-space of $A$ is the one-dimensional space spanned
by $\mathbf{1}_{2k-l}$. Therefore, any $p < 2k - l$ columns of $A$ are linearly independent, i.e., spark($A$) $= 2k - l$.
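
The construction can be reproduced numerically. The following NumPy sketch builds a dictionary with $2k - l$ atoms in dimension $2k - l - 1$ from the eigendecomposition of the Gram matrix with unit diagonal and off-diagonal entries $-1/(2k - l - 1)$; the function name `worst_case_dictionary` is hypothetical, and the eigenvectors returned by `numpy.linalg.eigh` are only one admissible choice of $U$.

```python
import numpy as np

def worst_case_dictionary(k, l):
    """Build A of shape (2k-l-1, 2k-l) whose Gram matrix has unit diagonal
    and off-diagonal entries -1/(2k-l-1)."""
    n = 2 * k - l                       # number of atoms
    mu = 1.0 / (n - 1)                  # target coherence
    M = (1.0 + mu) * np.eye(n) - mu * np.ones((n, n))
    eigvals, U = np.linalg.eigh(M)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]   # re-sort in decreasing order
    eigvals, U = eigvals[order], U[:, order]
    # T keeps the n-1 positive eigenvalues and drops the null one.
    T = np.hstack([np.diag(np.sqrt(eigvals[:-1])), np.zeros((n - 1, 1))])
    return T @ U.T
```

For $k = 3$ and $l = 1$, the resulting $4 \times 5$ matrix has unit-norm columns with pairwise inner products equal to $-1/4$, and the all-one vector lies in its null space.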

Before proceeding to the proof of Theorem 5, we need to define the concept of "reachability" of a 
subset Q: 

Definition 5 A subset $Q$ is said to be reachable by Oxx if there exists $y \in$ span($A_Q$) such that Oxx
with $y$ as input selects atoms in $Q$ during the first $|Q|$ iterations.

The concept of reachability was first introduced in [13]. The authors showed that any subset $Q$ with
$|Q| \le$ spark($A$) $- 2$ is reachable by OLS, see [13, Lemma 3]. On the other hand, they emphasized that
there exist dictionaries for which some subsets $Q$ can never be reached by OMP, see [13, Example 1].
This scenario does not occur, however, for the dictionary defined in (36), as stated in the next lemma:

Lemma 2 Let $A$ be defined as in (36) with $l < k$. Then any subset $Q$ with $|Q| = l$ is reachable by Oxx.

The proof of this result is deferred to Appendix C. To prove Theorem 5, we also need the following
technical lemma, whose proof is deferred to Appendix C:

Lemma 3 Let $A$ be defined as in (36) with $l < k$. Then, for any subset $Q$ with $|Q| = l$, there exists
a vector $y$ having two $(k - l)$-term representations with disjoint supports in the projected dictionary
$\tilde{C}_{\setminus Q} = \tilde{C}_{\{1, \ldots, 2k-l\} \setminus Q} \in \mathbb{R}^{(2k-l-1) \times (2k-2l)}$.

We are now ready to prove Theorem 5: 

Proof: (Theorem 5) Consider the dictionary $A$ defined in (36) with $l < k$. Let $Q$ be a subset of
cardinality $l$, arbitrarily chosen (say, the first $l$ atoms of the dictionary). We will exhibit a subset $Q^* \supset Q$
for which the result of Theorem 5 holds.

We first apply Lemma 2: there exists an input $y_1 \in$ span($A_Q$) for which Oxx selects all atoms in $Q$
during the first $l$ iterations. Then, we apply Lemma 3: there exists a vector $y_2$ having two $(k - l)$-term
representations in the projected dictionary $\tilde{C}_{\setminus Q}$. We will denote their respective supports by $Q_1$ and $Q_2$,
with $Q_1 \cap Q_2 = \emptyset$.

By virtue of [13, Lemma 15], Oxx with $y = y_1 + \varepsilon y_2$ as input selects the same atoms (i.e., $Q$) as
with $y_1$ as input during the first $l$ iterations as long as $\varepsilon > 0$ is sufficiently small. Moreover, the selection
rule (9) indicates that the atom $a_j$ selected at iteration $l + 1$ satisfies:

$$j \in \arg\max_i |\langle \tilde{c}_i, P_Q^\perp y \rangle| = \arg\max_i |\langle \tilde{c}_i, y_2 \rangle|, \qquad (40)$$

since $P_Q^\perp y = \varepsilon P_Q^\perp y_2 = \varepsilon y_2$. Now, we set $Q^*$ in such a way that $j \notin Q^*$:

$$Q^* = \begin{cases} Q \cup Q_1 & \text{if } j \in Q_2, \\ Q \cup Q_2 & \text{if } j \in Q_1. \end{cases} \qquad (41)$$

To complete the proof, it is easy to check that $y = y_1 + \varepsilon y_2 \in$ span($A_{Q^*}$) because $y_1 \in$ span($A_Q$) and
$y_2 \in$ span($\tilde{C}_{Q^* \setminus Q}$) $=$ span($\tilde{A}_{Q^* \setminus Q}$) $\subset$ span($A_{Q^*}$). ■

VIII. Conclusions 

The sufficient and worst-case necessary condition we derived for the success of Oxx after the first $l$
iterations have been completed reads $\mu < \frac{1}{2k - l - 1}$ and relaxes the coherence-based results by Tropp [7]
and Cai & Wang [14] corresponding to the case $l = 0$.

Our condition is obviously pessimistic since it is a worst-case condition for all possible supports of
cardinality $l$. In comparison, the conditions we elaborated in [13] are sharper (although significantly more
complex) and they are dedicated to a single support of size $l$. The latter conditions are indeed rather
impractical since they depend on the true support, which is unknown. In practice, they would have to be
evaluated for all possible pairs of complete/partial supports of dimension $k$ and $l$, and each evaluation requires
a pseudo-inverse computation. A compromise between the pessimistic coherence condition and those
elaborated in [13] would be to adapt our mutual coherence results to the cumulative coherence [7], and the
weak ERC condition [7], [27], [28] (also referred to as the Neumann ERC in [29]). The latter conditions
are intermediate conditions at iteration 0 between the mutual coherence condition $\mu < 1/(2k - 1)$ and
Tropp's ERC. Their computation remains simple as only inner products between the dictionary atoms
are involved. It would therefore be interesting to study how this type of condition evolves when
Oxx has recovered $l$ atoms of the support. This is part of our future work.

In this paper, we also did not investigate the case where the observed vector $y$ is corrupted by some
additive noise. This problem has been addressed in different contributions of the recent literature, see
e.g., [30], [31], and is interesting on its own. The extension of the proposed partial condition to noisy
settings is part of our ongoing work.

Appendix A 
Proof of the results of section V 

This section contains the proofs of Propositions 1 and 2 together with some useful technical lemmas. 

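Lemma 4 Assume $A$ satisfies the P-RIP($\underline{\delta}_{2,l}, \bar{\delta}_{2,l}$) and let

$$\mu_l^{OMP} \triangleq \max_{|Q| = l} \max_{i \neq j} |\langle \tilde{a}_i^Q, \tilde{a}_j^Q \rangle|. \qquad (42)$$

Then, we have

$$\mu_l^{OMP} \le \frac{\underline{\delta}_{2,l} + \bar{\delta}_{2,l}}{2}. \qquad (43)$$

Proof: By definition of $\underline{\delta}_{2,l}$ and $\bar{\delta}_{2,l}$, we must have for all $Q$, $Q'$ with $|Q| = l$, $|Q'| = 2$ and $Q' \cap Q = \emptyset$:

$$1 + \bar{\delta}_{2,l} \ge \lambda_{\max}(\tilde{A}_{Q'}^T \tilde{A}_{Q'}), \qquad (44)$$

$$1 - \underline{\delta}_{2,l} \le \lambda_{\min}(\tilde{A}_{Q'}^T \tilde{A}_{Q'}), \qquad (45)$$

where $\lambda_{\max}(M)$ (resp. $\lambda_{\min}(M)$) denotes the largest (resp. smallest) eigenvalue of $M$. Moreover, if
$Q' = \{i, j\}$, it is easy to check that the eigenvalues of $\tilde{A}_{Q'}^T \tilde{A}_{Q'}$ can be expressed as

$$\lambda(\tilde{A}_{Q'}^T \tilde{A}_{Q'}) = \frac{\|\tilde{a}_i\|^2 + \|\tilde{a}_j\|^2}{2} \pm \frac{\Delta}{2},$$

where

$$\Delta = \sqrt{(\|\tilde{a}_i\|^2 + \|\tilde{a}_j\|^2)^2 + 4(\langle \tilde{a}_i, \tilde{a}_j \rangle^2 - \|\tilde{a}_i\|^2 \|\tilde{a}_j\|^2)} \qquad (46)$$

$$= \sqrt{(\|\tilde{a}_i\|^2 - \|\tilde{a}_j\|^2)^2 + 4 \langle \tilde{a}_i, \tilde{a}_j \rangle^2}. \qquad (47)$$

Hence

$$\lambda_{\max}(\tilde{A}_{Q'}^T \tilde{A}_{Q'}) - \lambda_{\min}(\tilde{A}_{Q'}^T \tilde{A}_{Q'}) = \Delta \ge 2 |\langle \tilde{a}_i, \tilde{a}_j \rangle|.$$

Using (44)-(45), we thus obtain, $\forall i, j \notin Q$:

$$\underline{\delta}_{2,l} + \bar{\delta}_{2,l} \ge 2 |\langle \tilde{a}_i, \tilde{a}_j \rangle|. \qquad (48)$$

Now, this inequality also holds if $i \in Q$ or $j \in Q$ since the right-hand side of (48) is then equal to zero.
The result then follows from the definition of $\mu_l^{OMP}$. ■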
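
The key step of this proof is the closed form of the eigenvalue gap of a $2 \times 2$ Gram matrix, which is at least twice the absolute inner product of the two vectors. A small NumPy check is sketched below; the function names are hypothetical and the two input vectors stand for any pair of projected atoms.

```python
import numpy as np

def gram_eigengap(a_i, a_j):
    """Difference between the two eigenvalues of the Gram matrix of (a_i, a_j)."""
    G = np.array([[a_i @ a_i, a_i @ a_j],
                  [a_i @ a_j, a_j @ a_j]])
    lam = np.linalg.eigvalsh(G)          # ascending order
    return lam[1] - lam[0]

def closed_form_gap(a_i, a_j):
    """Closed form sqrt((|a_i|^2 - |a_j|^2)^2 + 4 <a_i, a_j>^2)."""
    ni, nj, c = a_i @ a_i, a_j @ a_j, a_i @ a_j
    return np.sqrt((ni - nj) ** 2 + 4.0 * c ** 2)
```

When the two vectors have equal norms, the gap reduces to exactly $2|\langle a_i, a_j \rangle|$, which is the case where the inequality is tight.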



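Lemma 5 Let $|Q| = l$ and $Q' \cap Q'' = \emptyset$, then $\forall u \in \mathbb{R}^{|Q''|}$,

$$\|\tilde{A}_{Q'}^T \tilde{A}_{Q''} u\| \le \mu_l^{OMP} \sqrt{|Q'| |Q''|} \, \|u\|. \qquad (49)$$

Proof: We have

$$\|\tilde{A}_{Q'}^T \tilde{A}_{Q''} u\| = \sqrt{\sum_{i \in Q'} \langle \tilde{a}_i, \tilde{A}_{Q''} u \rangle^2} \qquad (50)$$

$$= \sqrt{\sum_{i \in Q'} \Big( \sum_{j \in Q''} u_j \langle \tilde{a}_i, \tilde{a}_j \rangle \Big)^2} \qquad (51)$$

$$\le \sqrt{\sum_{i \in Q'} \Big( \sum_{j \in Q''} |u_j| \, |\langle \tilde{a}_i, \tilde{a}_j \rangle| \Big)^2} \qquad (52)$$

$$\le \mu_l^{OMP} \sqrt{|Q'|} \, \|u\|_1 \qquad (53)$$

$$\le \mu_l^{OMP} \sqrt{|Q'| |Q''|} \, \|u\|. \qquad (54)$$

■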



Using Lemmas 4 and 5, we can now prove Propositions 1 and 2: 






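Proof: (Proposition 1) $\forall j \notin Q^*$, the following inequalities hold:

$$\|\tilde{A}_{Q^* \setminus Q}^\dagger \tilde{a}_j\|_1 \le \sqrt{k - l} \, \|\tilde{A}_{Q^* \setminus Q}^\dagger \tilde{a}_j\|_2 \qquad (55)$$

$$\le \frac{\sqrt{k - l}}{1 - \underline{\delta}_{k-l,l}} \, \|\tilde{A}_{Q^* \setminus Q}^T \tilde{a}_j\|_2 \qquad (56)$$

$$\le \frac{k - l}{1 - \underline{\delta}_{k-l,l}} \, \mu_l^{OMP} \qquad (57)$$

$$\le \frac{(k - l)(\underline{\delta}_{2,l} + \bar{\delta}_{2,l})}{2(1 - \underline{\delta}_{k-l,l})}, \qquad (58)$$

where the first inequality follows from the equivalence of norms; the second from RIC properties (see
[26, Proposition 3.1]); the third from Lemma 5 and the fourth from Lemma 4. ■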

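Proof: (Proposition 2) First, notice that $A$ satisfies the P-RIP($\underline{\delta}_{q,0}, \bar{\delta}_{q,0}$) $\forall q$ with

$$\underline{\delta}_{q,0} = \bar{\delta}_{q,0} = (q - 1)\mu, \qquad (59)$$

see e.g., [7, Lemma 2.3]. Hence, (22) is a consequence of the following inequalities:

$$\|P_Q^\perp A_{Q'} x_{Q'}\|^2 \le \|A_{Q'} x_{Q'}\|^2 \le (1 + \bar{\delta}_{q,0}) \|x_{Q'}\|^2. \qquad (60)$$

Lower bound (23) may be derived by noticing that

$$\|P_Q^\perp A_{Q'} x_{Q'}\|^2 = \|A_{Q'} x_{Q'}\|^2 - \|P_Q A_{Q'} x_{Q'}\|^2, \qquad (61)$$

and

$$\|A_{Q'} x_{Q'}\|^2 \ge (1 - \underline{\delta}_{q,0}) \|x_{Q'}\|^2, \qquad (62)$$

$$\|P_Q A_{Q'} x_{Q'}\|^2 = \|(A_Q^\dagger)^T A_Q^T A_{Q'} x_{Q'}\|^2 \qquad (63)$$

$$\le \frac{\|A_Q^T A_{Q'} x_{Q'}\|^2}{1 - \underline{\delta}_{l,0}} \qquad (64)$$

$$\le \frac{\mu^2 \, l \, q \, \|x_{Q'}\|^2}{1 - \underline{\delta}_{l,0}}, \qquad (65)$$

where inequality (64) follows from standard relationships between the RIC properties of $A$ and transforms
of $A$, and $1 - \underline{\delta}_{l,0} > 0$ is a consequence of hypothesis $\mu < 1/(l - 1)$ [7, Lemma 2.3]; (65) is a consequence
of Lemma 5 (applied to the unprojected dictionary, for which $\mu_0^{OMP} = \mu$). ■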

Appendix B 
Proof of the results of section VI 

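Proof: (Lemma 1) The proof is by induction on $l$. The result obviously holds for $l = 0$ since $\mu_0^{\mathrm{OLS}} = \mu$. 
Let $\mathcal{Q}$ with $|\mathcal{Q}| = l \ge 1$ and consider $\mathcal{R}$ such that $\mathcal{Q} = \mathcal{R} \cup \{i\}$ with $|\mathcal{R}| = l - 1$. According to [13, 
Lemma 5], if $j \notin \mathcal{Q}$, we have the orthogonal decomposition 

$$\tilde{b}_j^{\mathcal{R}} = \eta_j\, \tilde{b}_j^{\mathcal{Q}} + \langle \tilde{b}_j^{\mathcal{R}}, \tilde{b}_i^{\mathcal{R}} \rangle\, \tilde{b}_i^{\mathcal{R}}. \tag{66}$$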

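Moreover, the assumption $\mu < 1/l$ implies that $A_{\mathcal{Q} \cup \{j\}}$, $A_{\mathcal{R} \cup \{j\}}$ and $A_{\mathcal{R} \cup \{i\}}$ are full column rank as 
families of at most $l + 1$ atoms [7, Lemma 2.3], which in turn implies that $\tilde{a}_j^{\mathcal{Q}}$, $\tilde{a}_j^{\mathcal{R}}$ and $\tilde{a}_i^{\mathcal{R}}$ are 
nonzero [13, Corollary 3]. Therefore, $\tilde{b}_j^{\mathcal{Q}}$, $\tilde{b}_j^{\mathcal{R}}$ and $\tilde{b}_i^{\mathcal{R}}$ are all of unit norm, and then (66) yields 

$$\eta_j = \pm\sqrt{1 - \langle \tilde{b}_j^{\mathcal{R}}, \tilde{b}_i^{\mathcal{R}} \rangle^2}. \tag{67}$$

If $j$ and $j' \notin \mathcal{Q}$, it follows that 

$$\eta_j\, \eta_{j'}\, \langle \tilde{b}_j^{\mathcal{Q}}, \tilde{b}_{j'}^{\mathcal{Q}} \rangle = \langle \tilde{b}_j^{\mathcal{R}}, \tilde{b}_{j'}^{\mathcal{R}} \rangle - \langle \tilde{b}_j^{\mathcal{R}}, \tilde{b}_i^{\mathcal{R}} \rangle \langle \tilde{b}_{j'}^{\mathcal{R}}, \tilde{b}_i^{\mathcal{R}} \rangle. \tag{68}$$

Bounding each inner product of the form $|\langle \tilde{b}_{\cdot}^{\mathcal{R}}, \tilde{b}_{\cdot}^{\mathcal{R}} \rangle|$ by $\mu_{l-1}^{\mathrm{OLS}}$ and using (33), we get: 

\begin{align}
|\langle \tilde{b}_j^{\mathcal{Q}}, \tilde{b}_{j'}^{\mathcal{Q}} \rangle| &\le \frac{\mu_{l-1}^{\mathrm{OLS}}}{1 - \mu_{l-1}^{\mathrm{OLS}}} \tag{69}\\
&\le \frac{\mu}{1 - (l-1)\mu - \mu} = \frac{\mu}{1 - l\mu}, \tag{70}
\end{align}

leading to (33). 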
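The bound $\mu/(1 - l\mu)$ obtained in (70) can be compared with a brute-force evaluation on a small dictionary. The Python sketch below is an illustration only, and it rests on two assumptions that are not stated in this appendix: $\mu_l^{\mathrm{OLS}}$ is taken to be the maximum of $|\langle \tilde{b}_i^{\mathcal{Q}}, \tilde{b}_j^{\mathcal{Q}} \rangle|$ over all $\mathcal{Q}$ with $|\mathcal{Q}| = l$ and all $i \neq j \notin \mathcal{Q}$, and the test dictionary is a hypothetical one built so that all pairwise inner products between atoms equal $-\mu$ with $\mu = 1/(n-1)$. The helper names are likewise illustrative.

```python
import itertools
import numpy as np

def equiangular_dictionary(n):
    """Build an (n-1) x n dictionary with unit-norm atoms and pairwise inner products -1/(n-1)."""
    mu = 1.0 / (n - 1)
    G = (1.0 + mu) * np.eye(n) - mu * np.ones((n, n))  # target Gram matrix, rank n-1
    w, V = np.linalg.eigh(G)                            # eigenvalues in ascending order
    w = np.clip(w[1:], 0.0, None)                       # drop the (numerically) zero eigenvalue
    return np.sqrt(w)[:, None] * V[:, 1:].T, mu

def mu_ols(A, l):
    """Assumed definition: worst-case coherence of the normalized projected atoms over all |Q| = l."""
    m, n = A.shape
    worst = 0.0
    for Q in itertools.combinations(range(n), l):
        Q = list(Q)
        P_perp = np.eye(m)
        if Q:
            P_perp -= A[:, Q] @ np.linalg.pinv(A[:, Q])
        rest = [i for i in range(n) if i not in Q]
        B = P_perp @ A[:, rest]
        B /= np.linalg.norm(B, axis=0)   # normalized projected atoms
        C = np.abs(B.T @ B)
        np.fill_diagonal(C, 0.0)
        worst = max(worst, C.max())
    return worst

A, mu = equiangular_dictionary(8)
for l in range(4):
    # each line shows l, the brute-force value, and the bound mu / (1 - l * mu)
    print(l, mu_ols(A, l), mu / (1.0 - l * mu))
```

For this particular equiangular dictionary, a direct calculation suggests that the bound is attained, so the two printed columns should agree up to rounding; for a generic dictionary one would only expect the first value not to exceed the second.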



Appendix C 
Proof of the results of section VII 

In this appendix, we provide a proof of Lemma 2. We use the notation $\mathcal{R}$ instead of $\mathcal{Q}$ to denote the 
current support. This change of notation is done to avoid confusion: in the rest of the paper, we have 
$|\mathcal{Q}| = l$ whereas in this appendix, the support cardinality may differ from $l$. 

We first need to prove the following technical lemma: 

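Lemma 6 Let $A$ be defined as in (36). Then, we have for all $\mathcal{R}$ with $|\mathcal{R}| < 2k - l$ and $i, j \notin \mathcal{R}$, $i \neq j$: 

\begin{align}
\langle \tilde{a}_i^{\mathcal{R}}, \tilde{a}_j^{\mathcal{R}} \rangle &= -\mu - \mu^2\, 1_{|\mathcal{R}|}^T (A_{\mathcal{R}}^T A_{\mathcal{R}})^{-1} 1_{|\mathcal{R}|}, \tag{71}\\
\|\tilde{a}_i^{\mathcal{R}}\|^2 &= 1 - \mu^2\, 1_{|\mathcal{R}|}^T (A_{\mathcal{R}}^T A_{\mathcal{R}})^{-1} 1_{|\mathcal{R}|}. \tag{72}
\end{align}

Proof: First recall that spark($A$) $= 2k - l$ (see section VII). Therefore, $A_{\mathcal{R}}$ is full rank when 
$|\mathcal{R}| < 2k - l$ and $\tilde{a}_i^{\mathcal{R}}$ reads 

$$\tilde{a}_i^{\mathcal{R}} = P_{\mathcal{R}}^{\perp} a_i = a_i - P_{\mathcal{R}} a_i = a_i - A_{\mathcal{R}} (A_{\mathcal{R}}^T A_{\mathcal{R}})^{-1} A_{\mathcal{R}}^T a_i. \tag{73}$$

Using this expression, we have 

\begin{align}
\langle \tilde{a}_i^{\mathcal{R}}, \tilde{a}_j^{\mathcal{R}} \rangle &= \langle a_i, a_j \rangle - a_i^T A_{\mathcal{R}} (A_{\mathcal{R}}^T A_{\mathcal{R}})^{-1} A_{\mathcal{R}}^T a_j, \tag{74}\\
\|\tilde{a}_i^{\mathcal{R}}\|^2 &= 1 - a_i^T A_{\mathcal{R}} (A_{\mathcal{R}}^T A_{\mathcal{R}})^{-1} A_{\mathcal{R}}^T a_i. \tag{75}
\end{align}

Taking into account that the inner product between any pair of atoms is equal to $-\mu$ by definition of 
$M = A^T A$, we obtain the result. 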

■ 
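Identities (71)-(72) lend themselves to a quick numerical check. The Python sketch below is a hedged illustration: since definition (36) is not reproduced in this appendix, it uses a hypothetical dictionary constructed only to share the property invoked in the proof (unit-norm atoms whose pairwise inner products all equal $-\mu$, with $\mu = 1/(n-1)$), and the function names are illustrative.

```python
import numpy as np

def equiangular_dictionary(n):
    """Build an (n-1) x n dictionary with unit-norm atoms and pairwise inner products -1/(n-1)."""
    mu = 1.0 / (n - 1)
    G = (1.0 + mu) * np.eye(n) - mu * np.ones((n, n))  # target Gram matrix, rank n-1
    w, V = np.linalg.eigh(G)                            # eigenvalues in ascending order
    w = np.clip(w[1:], 0.0, None)                       # drop the (numerically) zero eigenvalue
    return np.sqrt(w)[:, None] * V[:, 1:].T, mu

def lemma6_sides(A, mu, R, i, j):
    """Return ((inner product, formula (71)), (squared norm, formula (72))) for atoms i, j outside R."""
    AR = A[:, R]
    gram_R = AR.T @ AR
    # P_R^perp = I - A_R (A_R^T A_R)^{-1} A_R^T, as in (73)
    P_perp = np.eye(A.shape[0]) - AR @ np.linalg.solve(gram_R, AR.T)
    ai, aj = P_perp @ A[:, i], P_perp @ A[:, j]
    ones = np.ones(len(R))
    quad = ones @ np.linalg.solve(gram_R, ones)  # 1^T (A_R^T A_R)^{-1} 1
    return (ai @ aj, -mu - mu**2 * quad), (ai @ ai, 1.0 - mu**2 * quad)

A, mu = equiangular_dictionary(7)   # n = 7 atoms, e.g. 2k - l with k = 4, l = 1
(inner, inner_formula), (sqnorm, sqnorm_formula) = lemma6_sides(A, mu, [0, 1, 2], 3, 5)
print(np.isclose(inner, inner_formula), np.isclose(sqnorm, sqnorm_formula))
```

The subset `[0, 1, 2]` and the indices 3 and 5 are arbitrary; by the symmetry of the Gram matrix, any subset of size smaller than the number of atoms and any two distinct indices outside it should give the same agreement.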

Proof: (Lemma 2) We prove a result slightly more general than the statement of Lemma 2: for the 
dictionary defined as in (36), any subset $\mathcal{R}$ with $p = |\mathcal{R}| \le 2k - l - 2$ can be reached by Oxx. Lemma 
2 corresponds to the case $p = l$ ($p \le 2k - l - 2$ is always satisfied as long as $l < k$). 

The result is true for OLS by virtue of [13, Lemma 3], which states that any subset $\mathcal{R}$ of an arbitrary 
dictionary $A$ is reachable as long as $|\mathcal{R}| \le$ spark($A$) $- 2$. In particular, the latter condition is verified 
by the dictionary $A$ and the subset $\mathcal{R}$ considered here since spark($A$) $= 2k - l$ and $|\mathcal{R}| \le 2k - l - 2$ 
by hypothesis. 

We prove hereafter that the result is also true for OMP. Without loss of generality, we assume that the 
elements of $\mathcal{R}$ correspond to the first $p$ atoms of $A$ (the analysis performed hereafter remains valid for 
any other support $\mathcal{R}$ of cardinality $p$ since the content of the Gram matrix $A_{\mathcal{R}}^T A_{\mathcal{R}}$ is constant whatever 
the support $\mathcal{R}$: see (39)). For arbitrary values of $\epsilon_2, \ldots, \epsilon_p > 0$, we define the following recursive 
construction: 

• $y_1 = a_1$, 

• $y_{p+1} = y_p + \epsilon_{p+1} a_{p+1}$ 

($y_{p+1}$ implicitly depends on $\epsilon_2, \ldots, \epsilon_{p+1}$). We show by induction that for all $p \in \{1, \ldots, 2k - l - 2\}$, there 
exist $\epsilon_2, \ldots, \epsilon_p > 0$ such that OMP with the dictionary defined as in (36) and $y_p$ as input successively 
selects $a_1, \ldots, a_p$ during the first $p$ iterations (in particular, the selection rule (9) always yields a unique 
maximum). 

The statement is obviously true for $y_1 = a_1$. Assume that it is true for $y_p$ ($p < 2k - l - 2$) with some 
$\epsilon_2, \ldots, \epsilon_p > 0$ (these parameters will remain fixed in the following). According to [13, Lemma 15], there 
exists $\epsilon_{p+1} > 0$ such that OMP with $y_{p+1} = y_p + \epsilon_{p+1} a_{p+1}$ as input selects the same atoms as with $y_p$ 
during the first $p$ iterations, i.e., $a_1, \ldots, a_p$ are successively chosen. At iteration $p$, the current active set 
reads $\mathcal{R} = \{1, \ldots, p\}$ and the corresponding residual takes the form 

$$r^{\mathcal{R}} = \epsilon_{p+1}\, \tilde{a}_{p+1}^{\mathcal{R}}. \tag{76}$$

Thus, $a_{p+1}$ is chosen at iteration $p + 1$ if and only if 

$$|\langle \tilde{a}_i^{\mathcal{R}}, \tilde{a}_{p+1}^{\mathcal{R}} \rangle| < \|\tilde{a}_{p+1}^{\mathcal{R}}\|^2 \quad \forall i \neq p + 1. \tag{77}$$

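Now, $|\mathcal{R}| = p < 2k - l$ by hypothesis, so Lemma 6 applies. Using (71)-(72), it is easy to see that 
(77) is equivalent to 

$$\mu + 2\mu^2\, 1_p^T (A_{\mathcal{R}}^T A_{\mathcal{R}})^{-1} 1_p < 1. \tag{78}$$

Since $\mu = \frac{1}{2k-l-1} < \frac{1}{p+1} < \frac{1}{p-1}$, we have $1 - (p-1)\mu > 0$. Then, [7, Lemma 2.3] and $\|1_p\|^2 = p$ 
yield: 

$$1_p^T (A_{\mathcal{R}}^T A_{\mathcal{R}})^{-1} 1_p \le \frac{p}{1 - (p-1)\mu}. \tag{79}$$

Using the bound $\mu < 1/(p + 1)$, it follows that: 

\begin{align}
\mu + 2\mu^2\, 1_p^T (A_{\mathcal{R}}^T A_{\mathcal{R}})^{-1} 1_p &\le \mu + \frac{2\mu^2 p}{1 - (p-1)\mu} \tag{80}\\
&= \mu \left( \frac{1 + (p+1)\mu}{1 - (p-1)\mu} \right) \tag{81}\\
&< \frac{1}{p+1} \left( \frac{2}{1 - \frac{p-1}{p+1}} \right) = 1, \tag{82}
\end{align}

which proves that condition (78), and hence (77), is met. OMP therefore recovers the subset $\mathcal{R} \cup \{p+1\} = 
\{1, \ldots, p+1\}$. 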

■ 
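The recursive construction used in this proof can be simulated on a small instance. The Python sketch below is an illustration under explicit assumptions, not the construction of the paper itself: definition (36) is not reproduced in this appendix, so a hypothetical dictionary with $n = 2k - l$ unit-norm atoms and pairwise inner products $-\mu$, $\mu = 1/(2k - l - 1)$, is built from its Gram matrix; the weights follow a geometric sequence $\epsilon_{j+1} = 0.05\,\epsilon_j$, which is an arbitrary choice (the proof only asserts that suitable $\epsilon_j > 0$ exist); and the OMP routine is a plain textbook implementation with illustrative names.

```python
import numpy as np

def equiangular_dictionary(n):
    """Build an (n-1) x n dictionary with unit-norm atoms and pairwise inner products -1/(n-1)."""
    mu = 1.0 / (n - 1)
    G = (1.0 + mu) * np.eye(n) - mu * np.ones((n, n))  # target Gram matrix, rank n-1
    w, V = np.linalg.eigh(G)                            # eigenvalues in ascending order
    w = np.clip(w[1:], 0.0, None)                       # drop the (numerically) zero eigenvalue
    return np.sqrt(w)[:, None] * V[:, 1:].T, mu

def omp_selection_order(A, y, n_iter):
    """Run n_iter iterations of OMP on y and return the atom indices in selection order."""
    support, r = [], y.copy()
    for _ in range(n_iter):
        corr = np.abs(A.T @ r)
        if support:
            corr[support] = -np.inf      # selected atoms are orthogonal to r; never reselect them
        support.append(int(np.argmax(corr)))
        x, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ x        # residual after orthogonal projection
    return support

k, l = 5, 2
n = 2 * k - l                  # number of atoms
p = 2 * k - l - 2              # largest subset size covered by the proof
A, mu = equiangular_dictionary(n)
eps = 0.05 ** np.arange(p)     # eps_1 = 1, then a geometric decay (hypothetical choice)
y = A[:, :p] @ eps             # y_p = a_1 + eps_2 a_2 + ... + eps_p a_p
order = omp_selection_order(A, y, p)
print(order)
```

With these values the first $p$ atoms are expected to be selected in their natural order; a slower decay of the weights may break this ordering, which is consistent with the proof requiring each $\epsilon_{p+1}$ to be chosen small enough.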

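Proof: (Lemma 3) Using Lemma 6, we notice that $C_{\setminus\mathcal{Q}} = \beta \tilde{A}_{\setminus\mathcal{Q}}$ for some $\beta > 0$ since $\|\tilde{a}_i\|$ does 
not depend on $i$ and $c_i \neq 0$. Defining $v = 1_{2k-2l}$, we obtain 

\begin{align}
C_{\setminus\mathcal{Q}}\, v &= \beta \tilde{A}_{\setminus\mathcal{Q}}\, v \tag{83}\\
&= \beta \tilde{A}\, 1_{2k-l} = \beta P_{\mathcal{Q}}^{\perp} A\, 1_{2k-l} = 0_{2k-l-1}, \tag{84}
\end{align}

since $1_{2k-l}$ belongs to the null-space of $A$. 

Let us partition the elements of $v = 1_{2k-2l}$ into two subsets $\mathcal{Q}_1 \cup \mathcal{Q}_2$ with $\mathcal{Q}_1 \cap \mathcal{Q}_2 = \emptyset$ and $|\mathcal{Q}_1| = 
|\mathcal{Q}_2| = k - l$, and define $y = C_{\mathcal{Q}_1} 1_{k-l}$. According to (84), $y$ can be rewritten as $-C_{\mathcal{Q}_2} 1_{k-l}$; therefore $y$ has 
two $(k - l)$-sparse representations with disjoint supports in $C_{\setminus\mathcal{Q}}$. 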